Number Theory Meets Cache Locality – Efficient Implementation of a Small Prime FFT for the GNU Multiple Precision Arithmetic Library

Author

  • Tommy Färnqvist
Abstract

When multiplying very large integer operands, the GNU Multiple Precision Arithmetic Library uses a method based on the Fast Fourier Transform. For an algorithm to execute quickly on a modern computer, its data must be available in cache memory. Otherwise, a large portion of the execution time is spent accessing main memory, so it can pay off to perform considerable extra work to achieve good cache locality: in extreme cases, 500 primitive operations may be performed in the time of a single memory access. This report describes the implementation of a cache-friendly variant of the Fast Fourier Transform and its application to integer multiplication. The variant uses arithmetic modulo primes near machine word size. The multiplication method is shown to be competitive with its counterpart in version 4.1.4 of the GNU Multiple Precision Arithmetic Library on interesting platforms.

Swedish title: Talteori möter cachelokalitet – Effektiv implementation av småprimtals-FFT för GNU:s multiprecisionsaritmetikbibliotek
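The idea of multiplying integers with a Fourier transform carried out modulo a prime can be illustrated with a minimal sketch. This is not the report's implementation: it is written in Python rather than as GMP code, it makes no attempt at cache locality, and it assumes a single well-known 30-bit prime (998244353, with primitive root 3) in place of the word-size primes the abstract mentions. The names `ntt`, `multiply`, and `BASE_BITS` are hypothetical.

```python
# Illustrative sketch only: integer multiplication through a transform
# computed modulo a prime. Assumptions: one fixed prime, 8-bit digits.

P = 998244353   # prime equal to 119 * 2**23 + 1
G = 3           # a primitive root modulo P
BASE_BITS = 8   # operands are split into 8-bit digits


def ntt(a, invert=False):
    """In-place radix-2 transform modulo P; len(a) must be a power of two."""
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        # Root of unity of order `length` (its inverse for the inverse transform).
        w_len = pow(G, (P - 1) // length, P)
        if invert:
            w_len = pow(w_len, P - 2, P)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % P
                a[k] = (u + v) % P
                a[k + half] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        for i in range(n):
            a[i] = a[i] * n_inv % P
    return a


def to_digits(value):
    """Split a non-negative integer into little-endian BASE_BITS-bit digits."""
    mask = (1 << BASE_BITS) - 1
    digits = []
    while value:
        digits.append(value & mask)
        value >>= BASE_BITS
    return digits


def multiply(x, y):
    """Multiply two non-negative integers by convolving their digits modulo P."""
    if x == 0 or y == 0:
        return 0
    a, b = to_digits(x), to_digits(y)
    # With a single prime, every convolution coefficient must stay below P.
    max_digit = (1 << BASE_BITS) - 1
    if min(len(a), len(b)) * max_digit * max_digit >= P:
        raise ValueError("operands too large for this single-prime sketch")
    n = 1
    while n < len(a) + len(b):
        n <<= 1
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    ntt(a)
    ntt(b)
    c = [s * t % P for s, t in zip(a, b)]
    ntt(c, invert=True)
    # Recombine the coefficients; the shifts perform the carry propagation.
    result = 0
    for i, coeff in enumerate(c):
        result += coeff << (BASE_BITS * i)
    return result
```

For example, `multiply(2**1000 + 1, 3**500)` agrees with Python's built-in `*`. The size check reflects a limitation of using one small prime; handling larger operands would require a larger modulus or several primes, which this sketch does not attempt.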


Similar resources

Differential Fault Attacks and Countermeasures in Elliptic Curve Cryptography

In asymmetric cryptography, Elliptic Curve Cryptography (ECC) is the fastest in terms of computation and the strongest in terms of security. It can be used in message encryption/decryption, digital signature or key exchange. ECC can be implemented in hardware over the binary field GF(2^n) or in software over the prime field GF(p). This paper presents an efficient software implementation of ECC scalar multiplicatio...


A Highly Efficient Implementation of Multiple Precision Sparse Matrix-Vector Multiplication and Its Application to Product-type Krylov Subspace Methods

We evaluate the performance of the Krylov subspace method by using highly efficient multiple precision sparse matrix-vector multiplication (SpMV). BNCpack is our multiple precision numerical computation library based on MPFR/GMP, which is one of the most efficient arbitrary precision floating-point arithmetic libraries. However, it does not include functions that can manipulate multiple precisi...


Computing Mod Without Mod

Encryption algorithms are designed to be difficult to break without knowledge of the secrets or keys. To achieve this, the algorithms require the keys to be large, with some algorithms having a recommended size of 2048 bits or more. However, most modern processors only support computation on 64 bits at a time. Therefore, standard operations with large numbers are more complicated to implement. One ...


Blum Blum Shub on the GPU

Context. The cryptographically secure pseudo-random number generator Blum Blum Shub (BBS) is a simple algorithm with a strong security proof; however, it requires very large numbers to be secure, which makes it computationally heavy. The Graphics Processing Unit (GPU) is a common vector processor originally dedicated to computer-game graphics, but it has since been adapted to perform general-purpos...


Toom-Cook Multiplication: Some Theoretical and Practical Aspects

Toom-Cook multiprecision multiplication is a well-known multiprecision multiplication method, which can make use of multiprocessor systems. In this paper the Toom-Cook complexity is derived, some explicit proofs of the Toom-Cook interpolation method are given, the even-odd method for interpolation is explained, and certain aspects of a 32-bit C++ and assembler implementation, which is in develo...




Publication date: 2005